FROM LARGE DEVIATIONS TO WASSERSTEIN GRADIENT FLOWS IN MULTIPLE DIMENSIONS

MATTHIAS ERBAR, JAN MAAS AND MICHIEL RENGER 


Abstract. We study the large deviation rate functional for the empirical distribution
of independent Brownian particles with drift. In one dimension, it has been shown
by Adams, Dirr, Peletier and Zimmer [ADPZ11] that this functional is asymptotically
equivalent (in the sense of $\Gamma$-convergence as the time-step goes to zero) to the
Jordan-Kinderlehrer-Otto functional arising in the Wasserstein gradient flow structure of the
Fokker-Planck equation. In higher dimensions, part of this statement (the lower bound)
has recently been proved by Duong, Laschos and Renger, but the upper bound remained
open, since the proof in [DLR13] relies on regularity properties of optimal transport maps
that are restricted to one dimension. In this note we present a new proof of the upper
bound, thereby generalising the result of [ADPZ11] to arbitrary dimensions.


1. Introduction 

In the recent paper [ADPZ11], Adams, Dirr, Peletier and Zimmer unveiled a fundamental
connection between two seemingly unrelated aspects of diffusion equations. They
connected the large deviation rate functional for the empirical measure of a system of
independently diffusing particles to the entropy gradient flow structure of diffusion
equations in the Wasserstein space of probability measures. Let us informally describe these
two concepts and their connection here, before giving rigorous statements in Section 2.

Large deviations for independently diffusing particles. We consider $n$ indistinguishable
particles evolving according to the stochastic differential equations

\[ dX_i(t) = -\nabla\Psi(X_i(t))\,dt + \sqrt{2}\,dW_i(t) , \tag{1} \]

where $(W_1(t), \ldots, W_n(t))_{t \geq 0}$ is a collection of independent standard
$\mathbb{R}^d$-valued Brownian motions. We assume that $\Psi : \mathbb{R}^d \to \mathbb{R}$ is
twice continuously differentiable and that its Hessian is uniformly bounded from below. Let
$\rho^n_t := n^{-1} \sum_{i=1}^n \delta_{X_i(t)}$ denote the empirical measure of the particles.
If the initial values $X_i(0)$ are chosen deterministically such that $\rho^n_0$
converges weakly to some fixed measure $\rho_0 \in \mathcal{P}(\mathbb{R}^d)$, then, for each
$t > 0$, it is a classical result that the empirical measure $\rho^n_t$ converges almost surely
to the unique solution of the Fokker-Planck equation

\[ \partial_t \rho_t = \Delta\rho_t + \operatorname{div}(\rho_t \nabla\Psi) \tag{2} \]

with initial condition $\rho_0$, see, e.g., [DG87, FK06] for much stronger results. Under suitable
growth conditions on $\Psi$, a Sanov-type theorem implies that the random measures $\rho^n_t$
satisfy a large deviation principle of the form

\[ \mathbb{P}[\rho^n_t \approx \rho] \sim \exp\big(-n\, I_t(\rho\,|\,\rho_0)\big) , \]

where the rate functional is given by

\[ I_t(\rho\,|\,\rho_0) := \inf_{\gamma \in \Gamma(\rho_0, \rho)} H(\gamma\,|\,\rho_{0,t}) , \tag{3} \]

see [Leo07, Proposition 3.2] and [PRV13, Theorem A.1]. Here,
$\rho_{0,t} \in \mathcal{P}(\mathbb{R}^d \times \mathbb{R}^d)$ denotes the joint law of a
solution $(X_0, X_t)$ to (1) with random initial condition $X_0 \sim \rho_0$ (independent of the
Brownian motion), $H(\cdot\,|\,\rho_{0,t})$ denotes the relative entropy with respect to
$\rho_{0,t}$, and $\Gamma(\rho_0, \rho)$ is the set of probability measures
$\gamma \in \mathcal{P}(\mathbb{R}^d \times \mathbb{R}^d)$ with marginals $\rho_0$ and $\rho$.
For background on large deviation theory we refer the reader to [DZ98, FK06].
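As an informal illustration of the particle system (1) and its hydrodynamic limit (2), the following sketch simulates the particles with an Euler-Maruyama discretisation in one dimension. The quadratic potential $\Psi(x) = x^2/2$, the function names and all numerical parameters are assumptions chosen for illustration; for this potential the solution of (2) started from a point mass at $x_0$ is Gaussian with mean $x_0 e^{-t}$ and variance $1 - e^{-2t}$.

```python
import math
import random


def simulate_particles(n, t_end, dt, x0, grad_psi, seed=0):
    """Euler-Maruyama scheme for dX = -grad_psi(X) dt + sqrt(2) dW in 1D.

    All n particles start at the deterministic position x0.
    """
    rng = random.Random(seed)
    steps = int(round(t_end / dt))
    noise_scale = math.sqrt(2.0 * dt)
    xs = [x0] * n
    for _ in range(steps):
        xs = [x - grad_psi(x) * dt + noise_scale * rng.gauss(0.0, 1.0) for x in xs]
    return xs


def empirical_mean_var(xs):
    """Mean and variance of the empirical measure n^{-1} sum_i delta_{x_i}."""
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / len(xs)
    return m, v


# Assumed example: Psi(x) = x^2 / 2, so grad_psi(x) = x.
particles = simulate_particles(n=4000, t_end=1.0, dt=0.01, x0=2.0,
                               grad_psi=lambda x: x)
mean, var = empirical_mean_var(particles)

# Moments of the Fokker-Planck solution at time t = 1 for this potential.
exact_mean = 2.0 * math.exp(-1.0)
exact_var = 1.0 - math.exp(-2.0)
print(mean, exact_mean, var, exact_var)
```

The empirical moments should be close to the exact ones, up to sampling error of order $n^{-1/2}$ and a time-discretisation bias of order `dt`.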

In this paper we are interested in the short-time behaviour of the rate functional
$I_t(\cdot\,|\,\rho_0)$ and its relation to the Wasserstein gradient structure of the
Fokker-Planck equation.


The Wasserstein gradient structure of the Fokker-Planck equation. A seminal
result by Jordan-Kinderlehrer-Otto [JKO98] asserts that the Fokker-Planck equation (2)
can be regarded as the gradient flow equation of the relative entropy

\[ \mathcal{F}(\rho) := \begin{cases}
   \displaystyle \int \rho(x) \log \rho(x)\,dx + \int \Psi(x)\rho(x)\,dx , & \rho(dx) = \rho(x)\,dx , \\
   +\infty , & \text{otherwise} ,
   \end{cases} \]

in the Wasserstein space of probability measures $(\mathcal{P}_2(\mathbb{R}^d), W_2)$. This
result can be rigorously interpreted in different ways, e.g., using the theory of gradient
flows in metric spaces, or using an infinite-dimensional Riemannian structure on the space of
probability measures; see [AGS08] for details. Here we present the original interpretation from
[JKO98] in terms of the convergence of a discrete "minimizing movement" scheme, which
can be seen as an analogue of the implicit Euler scheme for the gradient flow equation.
For $\rho_0 \in \mathcal{P}_2(\mathbb{R}^d)$ and $t > 0$, define
$J_t(\cdot\,|\,\rho_0) : \mathcal{P}_2(\mathbb{R}^d) \to \mathbb{R} \cup \{+\infty\}$ by

\[ J_t(\rho\,|\,\rho_0) := \mathcal{F}(\rho) - \mathcal{F}(\rho_0) + \frac{1}{2t} W_2(\rho_0, \rho)^2 ,
   \quad \text{and set} \quad
   S_t[\rho_0] := \operatorname*{argmin}_{\rho \in \mathcal{P}_2(\mathbb{R}^d)} J_t(\rho\,|\,\rho_0) . \tag{4} \]

Since this minimisation problem has a unique minimiser, $S_t[\rho_0]$ is well defined. The
JKO-functional $J_t$ can be used to construct an iterative discretisation scheme: it was shown in
[JKO98] that

\[ \rho_t := \lim_{n \to \infty} (S_{t/n})^n[\rho_0] \]

exists for each $t > 0$ and satisfies the Fokker-Planck equation (2).
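To make the minimizing movement scheme (4) concrete, the sketch below iterates it in a setting where every step is explicit. It assumes $d = 1$, $\Psi(x) = x^2/2$, and restricts the minimisation to Gaussians $N(m, s^2)$; this restriction, like the function names, is an assumption of the illustration and not part of the original text. On Gaussians one has $\mathcal{F} = -\log s + (m^2 + s^2)/2 + \text{const}$ and $W_2^2 = (m - m_0)^2 + (s - s_0)^2$, so the first-order conditions of one step are a linear equation for $m$ and a quadratic equation for $s$.

```python
import math


def jko_step_gaussian(m0, s0, tau):
    """One step of scheme (4) with time step tau, restricted to 1D Gaussians.

    Minimises -log(s) + (m^2 + s^2)/2 + ((m - m0)^2 + (s - s0)^2) / (2 tau).
    """
    # Stationarity in m:  m + (m - m0)/tau = 0.
    m = m0 / (1.0 + tau)
    # Stationarity in s:  -1/s + s + (s - s0)/tau = 0,
    # i.e. (1 + tau) s^2 - s0 s - tau = 0; take the positive root.
    s = (s0 + math.sqrt(s0 * s0 + 4.0 * tau * (1.0 + tau))) / (2.0 * (1.0 + tau))
    return m, s


def jko_flow_gaussian(m0, s0, t, n):
    """Apply n steps of size t/n, approximating (S_{t/n})^n."""
    tau = t / n
    m, s = m0, s0
    for _ in range(n):
        m, s = jko_step_gaussian(m, s, tau)
    return m, s


m, s = jko_flow_gaussian(m0=2.0, s0=0.5, t=1.0, n=2000)
# Fokker-Planck solution for this potential: mean m0 e^{-t},
# variance 1 + (s0^2 - 1) e^{-2t}.
print(m, 2.0 * math.exp(-1.0))
print(s * s, 1.0 + (0.25 - 1.0) * math.exp(-2.0))
```

Each step coincides with an implicit Euler step for the ordinary differential equations $\dot m = -m$ and $\dot s = 1/s - s$, which is the analogy mentioned above.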


Relating $I_t$ and $J_t$. The main result of [ADPZ11] unveils a relation between the large
deviation principle and the Wasserstein gradient flow structure. Roughly speaking, it
asserts that the functionals $I_t$ and $\tfrac12 J_t$ are asymptotically equivalent as
$t \to 0$. More precisely, it was shown that

\[ I_t(\cdot\,|\,\rho_0) - \frac{1}{4t} W_2(\rho_0, \cdot)^2 \;\longrightarrow\;
   \tfrac12 \mathcal{F}(\cdot) - \tfrac12 \mathcal{F}(\rho_0) \quad \text{as } t \to 0 , \tag{5} \]

in the sense of $\Gamma$-convergence. This result provides an appealing microscopic explanation
for the emergence of the Wasserstein gradient flow structure at the macroscopic level.

The proof of this theorem in [ADPZ11] required two strong technical assumptions.
Firstly, the result was limited to one space dimension. Secondly, the proof required
highly restrictive regularity assumptions on the involved measures.
In a subsequent paper [DLR13], Duong, Laschos and Renger were able to remove the
strong regularity assumptions. Their approach is based on a different representation of
the rate functional $I_t$ due to Dawson and Gärtner [DG87] (see also [FK06]), which we
shall describe in Section 2. The proof of the lower bound in the $\Gamma$-convergence result in
[DLR13] is valid in arbitrary dimensions. However, the remaining part of the argument
(the construction of a recovery sequence) is restricted to one dimension, since it relies on
regularity estimates for optimal transport maps which are known to be false in multiple
dimensions.

In this note we shall provide a different argument for the construction of a recovery
sequence that works in arbitrary dimensions. Combined with the result from [DLR13],
this completes the proof of (5) in arbitrary dimensions. We refer to Theorem 2.2 below
for a precise statement.

Structure of the paper. In Section 2 we give a detailed statement of the main convergence
result. In Section 3 we collect well-known results about Wasserstein gradient flows
that will be used in the proof. Section 4 contains the proof of the convergence result.
For completeness, we also include the proof of the lower bound taken from [DLR13]. In
the appendix we provide a short proof of the equivalence of different formulations of the
Benamou-Brenier formula.


2. Statement of the main result 

In this section we shall rigorously introduce the three objects appearing in the main
result of this paper: the Wasserstein metric $W_2$, the relative entropy functional
$\mathcal{F}$ and the large deviation rate functional $I_\tau$.


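The Wasserstein metric. Let
$\mathcal{P}_2(\mathbb{R}^d) := \{\rho \in \mathcal{P}(\mathbb{R}^d) : \int |x|^2 \rho(dx) < \infty\}$
denote the set of probability measures with finite second moment. The $L^2$-Wasserstein distance
between $\rho_0, \rho_1 \in \mathcal{P}_2(\mathbb{R}^d)$ is defined by

\[ W_2(\rho_0, \rho_1) := \inf_{\pi \in \Gamma(\rho_0, \rho_1)}
   \left( \int_{\mathbb{R}^d \times \mathbb{R}^d} |x - y|^2 \,\pi(dx, dy) \right)^{1/2} , \]

where the infimum is taken over all couplings $\pi$ of $\rho_0$ and $\rho_1$, i.e.,
$\Gamma(\rho_0, \rho_1)$ denotes the collection of all
$\pi \in \mathcal{P}(\mathbb{R}^d \times \mathbb{R}^d)$ with
$\pi(\cdot \times \mathbb{R}^d) = \rho_0(\cdot)$ and $\pi(\mathbb{R}^d \times \cdot) = \rho_1(\cdot)$.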
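For orientation, the infimum over couplings can be evaluated explicitly for two empirical measures on the real line with the same number of atoms, because the monotone (sorted) pairing is then optimal. The sketch below uses this one-dimensional fact; the function name is hypothetical, and no comparably simple formula is available in higher dimensions.

```python
import math


def w2_empirical_1d(xs, ys):
    """W_2 distance between n^{-1} sum delta_{x_i} and n^{-1} sum delta_{y_i} in 1D.

    Uses the monotone coupling obtained by sorting, which is optimal
    for the quadratic cost on the real line.
    """
    if len(xs) != len(ys):
        raise ValueError("both samples must contain the same number of atoms")
    pairs = zip(sorted(xs), sorted(ys))
    cost = sum((x - y) ** 2 for x, y in pairs) / len(xs)
    return math.sqrt(cost)


# Translating a measure by 3 moves it a W_2 distance of exactly 3.
print(w2_empirical_1d([0.0, 1.0, 2.0], [5.0, 3.0, 4.0]))
```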



The relative entropy. Throughout this paper we assume that
$\Psi : \mathbb{R}^d \to \mathbb{R}$ is twice continuously differentiable and $\lambda$-convex
for some $\lambda \in \mathbb{R}$, i.e., $\operatorname{Hess}\Psi(x) \geq \lambda\,\mathrm{Id}$
for all $x \in \mathbb{R}^d$. The relative entropy functional
$\mathcal{F} : \mathcal{P}_2(\mathbb{R}^d) \to \mathbb{R} \cup \{+\infty\}$ is defined by

\[ \mathcal{F}(\rho) := \begin{cases}
   \displaystyle \int_{\mathbb{R}^d} f(x) \log f(x)\,dx + \int_{\mathbb{R}^d} \Psi(x) f(x)\,dx ,
     & \text{if } \rho(dx) = f(x)\,dx , \\
   +\infty , & \text{otherwise} .
   \end{cases} \]

This functional is well-defined, since the assumption on the second moment implies that
the negative parts of $f \log f$ and $\Psi f$ are integrable with respect to the Lebesgue measure.
If $\rho$ is absolutely continuous with respect to the Lebesgue measure, then $\mathcal{F}$ can
be written as a relative entropy with respect to the equilibrium measure
$\nu(dx) = e^{-\Psi(x)}\,dx$. Namely,

\[ \mathcal{F}(\rho) = \int_{\mathbb{R}^d} g(x) \log g(x)\,d\nu(x) , \]

where $\rho(dx) = g(x)\,\nu(dx)$.

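We also introduce the relative Fisher information
$\mathcal{G} : \mathcal{P}_2(\mathbb{R}^d) \to [0, +\infty]$ defined by

\[ \mathcal{G}(\rho) := \begin{cases}
   \displaystyle \int_{\{g > 0\}} \frac{|\nabla g(x)|^2}{g(x)}\,d\nu(x) ,
     & \text{if } \rho(dx) = g(x)\,\nu(dx),\ g \in W^{1,1}_{\mathrm{loc}}(\mathbb{R}^d) , \\
   +\infty , & \text{otherwise} .
   \end{cases} \]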




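The large deviation rate functional. The definition of the rate functional $I_\tau$ involves a
weighted Sobolev norm of negative order $-1$. Let $\mathcal{D} = C_c^\infty(\mathbb{R}^d)$ be
the space of test functions and let $\mathcal{D}'$ be the dual space of distributions. Given
$\rho \in \mathcal{P}(\mathbb{R}^d)$, we define the weighted $H^{-1}(\rho)$-norm of
$s \in \mathcal{D}'$ by the duality formula

\[ \|s\|_{-1,\rho}^2 := \sup_{f \in \mathcal{D}}
   \frac{\langle s, f \rangle^2}{\int |\nabla f|^2 \,d\rho} , \]

where the supremum runs over all smooth test functions $f \in \mathcal{D}$ for which the
denominator does not vanish. Using the identity
$a^2/b = \sup_{c \in \mathbb{R}} (2ac - c^2 b)$, valid for $b > 0$, one obtains the equivalent
formula

\[ \|s\|_{-1,\rho}^2 = \sup_{f \in \mathcal{D}}
   \Big\{ 2\langle s, f \rangle - \int |\nabla f|^2 \,d\rho \Big\} . \]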
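The elementary identity behind the second formula, $a^2/b = \sup_{c} (2ac - c^2 b)$ for $b > 0$ with maximiser $c = a/b$, can be checked numerically. The following sketch compares a brute-force grid search with the closed form; the function names, grid bounds and sample values are arbitrary choices made for this illustration.

```python
def quadratic_sup_on_grid(a, b, c_min=-50.0, c_max=50.0, num=200001):
    """Approximate sup over c of (2 a c - c^2 b) by evaluating on a uniform grid."""
    best = float("-inf")
    for k in range(num):
        c = c_min + (c_max - c_min) * k / (num - 1)
        best = max(best, 2.0 * a * c - c * c * b)
    return best


def quadratic_sup_closed_form(a, b):
    """Exact value a^2 / b, attained at c = a / b (requires b > 0)."""
    if b <= 0:
        raise ValueError("b must be positive")
    return a * a / b


# With a playing the role of <s, f> and b that of the integral of |grad f|^2,
# rescaling f by c recovers the quotient in the duality formula.
print(quadratic_sup_on_grid(3.0, 2.0), quadratic_sup_closed_form(3.0, 2.0))
```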

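For fixed $\rho_0 \in \mathcal{P}_2(\mathbb{R}^d)$ and $\tau > 0$, the functional
$I_\tau(\cdot\,|\,\rho_0) : \mathcal{P}_2(\mathbb{R}^d) \to [0, +\infty]$ is defined by

\[ I_\tau(\rho\,|\,\rho_0) := \inf_{(\rho_t)_t \in AC^2(\rho_0, \rho)} \frac{1}{4\tau}
   \int_0^1 \big\| \partial_t \rho_t - \tau\Delta\rho_t
   - \tau \operatorname{div}(\rho_t \nabla\Psi) \big\|_{-1,\rho_t}^2 \,dt , \tag{6} \]

where $AC^2(\rho_0, \rho_1)$ denotes the set of 2-absolutely continuous curves
$(\rho_t)_{t \in [0,1]}$ in $(\mathcal{P}_2(\mathbb{R}^d), W_2)$ with boundary conditions
$\rho|_{t=0} = \rho_0$ and $\rho|_{t=1} = \rho_1$. We refer to Section 3 for the definition
of 2-absolute continuity. Intuitively, $I_\tau(\rho\,|\,\rho_0)$ is the value of an optimal
control problem, which requires interpolating between $\rho_0$ and $\rho$ in such a way that
deviations from the Fokker-Planck equation

\[ \partial_t \rho_t = \tau\Delta\rho_t + \tau \operatorname{div}(\rho_t \nabla\Psi) \]

are minimised.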


Remark 2.1. Under two different sets of growth conditions on the potential $\Psi$, coined
'subquadratic' and 'superquadratic', the term inside the infimum of (6) is the large deviation
rate functional for trajectories $[0, \tau] \to \mathcal{P}(\mathbb{R}^d)$ of the empirical
measure of independent particles, see [DG87]. Using the contraction principle, it was proved in
[DLR13, Cor. 4.10] that the large deviation rate functional for the empirical measure at the end
time $\tau$ is obtained by taking the infimum over (1-)absolutely continuous curves in
$(\mathcal{P}_2(\mathbb{R}^d), W_2)$ with the right boundary conditions. In the subquadratic
case, it follows from the proof of [DLR13, Prop. 4.6] that if
$\rho_0 \in \mathcal{P}_2(\mathbb{R}^d)$ and $\mathcal{F}(\rho_0) < \infty$, any weakly
continuous curve with
$\int_0^\tau \|\partial_t \rho_t - \Delta\rho_t - \operatorname{div}(\rho_t \nabla\Psi)\|_{-1,\rho_t}^2 \,dt < \infty$
is also 2-Wasserstein absolutely continuous.
In the superquadratic case, the same result was proved in [FN12, Lem. 2.1]. Therefore,
under both sets of conditions on $\Psi$, we can take the infimum over 2-absolutely continuous
curves in $(\mathcal{P}_2(\mathbb{R}^d), W_2)$, hence the large deviation rate functional (3)
coincides with (6). In the rest of this paper we will not be concerned with the exact conditions
under which these expressions coincide, but rather take (6) as the object of study. For more
details, see [DLR13, Section 4]. □

Now we are ready to state the main theorem of this paper: 


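Theorem 2.2 (Main result). Let $\Psi \in C^2(\mathbb{R}^d)$ be $\lambda$-convex for some
$\lambda \in \mathbb{R}$. Then, for every $\rho_0 \in \mathcal{P}_2(\mathbb{R}^d)$ such that
$\mathcal{G}(\rho_0) < \infty$, we have

\[ I_\tau(\,\cdot\,|\,\rho_0) - \frac{W_2^2(\rho_0, \,\cdot\,)}{4\tau}
   \;\xrightarrow[\tau \to 0]{}\; \tfrac12 \mathcal{F}(\,\cdot\,) - \tfrac12 \mathcal{F}(\rho_0) \tag{7} \]

in the sense of $\Gamma$-convergence. More precisely:

(i) For any $\rho_1 \in \mathcal{P}_2(\mathbb{R}^d)$ and any sequence
$(\rho_1^\tau)_\tau \subset \mathcal{P}_2(\mathbb{R}^d)$ converging to $\rho_1$ in the
2-Wasserstein metric, we have

\[ \liminf_{\tau \to 0} \left( I_\tau(\rho_1^\tau\,|\,\rho_0)
   - \frac{W_2^2(\rho_0, \rho_1^\tau)}{4\tau} \right)
   \geq \tfrac12 \mathcal{F}(\rho_1) - \tfrac12 \mathcal{F}(\rho_0) . \tag{8} \]

In addition, if $\nu(\mathbb{R}^d) = \int_{\mathbb{R}^d} e^{-\Psi(x)}\,dx < \infty$, then the
lower bound (8) also holds for any weakly converging sequence
$(\rho_1^\tau)_\tau \subset \mathcal{P}_2(\mathbb{R}^d)$.

(ii) For any $\rho_1 \in \mathcal{P}_2(\mathbb{R}^d)$ there exists a sequence
$(\rho_1^\tau)_\tau \subset \mathcal{P}_2(\mathbb{R}^d)$ converging to $\rho_1$ in
the 2-Wasserstein metric such that

\[ \limsup_{\tau \to 0} \left( I_\tau(\rho_1^\tau\,|\,\rho_0)
   - \frac{W_2^2(\rho_0, \rho_1^\tau)}{4\tau} \right)
   \leq \tfrac12 \mathcal{F}(\rho_1) - \tfrac12 \mathcal{F}(\rho_0) . \tag{9} \]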


As discussed in the introduction, this theorem was first proved in dimension 1 in
[ADPZ11] under more restrictive conditions on the measures $\rho_0$ and $\rho_1$. Part (i) has
been extended to arbitrary dimensions in [DLR13]. The novel contribution of our paper
is a proof of (ii) in arbitrary dimensions.

Remark 2.3. The right-hand side in (8) and (9) is well-defined in
$\mathbb{R} \cup \{+\infty\}$, since our assumptions on $\rho_0$ imply that
$\mathcal{F}(\rho_0) < \infty$. This is a consequence of the HWI-inequality by Otto and Villani
[OV00] (see also [Vil09, Corollary 20.13]), which asserts that
$\mathcal{F}(\rho) \leq W_2(\rho, \nu) \sqrt{\mathcal{G}(\rho)} - \frac{\lambda}{2} W_2^2(\rho, \nu)$. □


3. Ingredients of the proof 


The Benamou-Brenier formula. It will be convenient to work with the dynamic
characterisation of the Wasserstein distance due to Benamou-Brenier [BB00], which asserts
that, for $\rho_0, \rho_1 \in \mathcal{P}_2(\mathbb{R}^d)$,

\[ W_2^2(\rho_0, \rho_1) = \inf_{(\rho_t)_t \in AC^2(\rho_0, \rho_1)}
   \int_0^1 \|\partial_t \rho_t\|_{-1,\rho_t}^2 \,dt . \tag{10} \]

For $p \geq 1$, recall that a curve $(\rho_t)_{t \in [0,1]}$ is said to be $p$-absolutely
continuous with respect to $W_2$, if there exists a scalar function $m \in L^p(0, 1)$ satisfying
$W_2(\rho_s, \rho_t) \leq \int_s^t m(r)\,dr$ for all $0 \leq s \leq t \leq 1$. We use the
notation $(\rho_t)_t \in AC^p(\rho_0, \rho_1)$. If $p = 1$, we simply say that
$(\rho_t)_{t \in [0,1]}$ is absolutely continuous. In this case, the metric derivative

\[ |\rho_t'| := \lim_{h \to 0} \frac{W_2(\rho_{t+h}, \rho_t)}{|h|} \]

exists for a.e. $t \in (0, 1)$, see, e.g., [AGS08, Theorem 1.1.2] for more details. It can be
shown that (10) implies the identity

\[ |\rho_t'| = \|\partial_t \rho_t\|_{-1,\rho_t} . \tag{11} \]

We refer to Appendix A for an equivalent formulation of the Benamou-Brenier formula
which is commonly used in the literature on optimal transport and to [AGS08, Theorem
8.3.1] for a proof of (10), (11) in this formulation.


Relative entropy, Fisher information, and heat flow. A seminal result by McCann
[McC97] asserts that the $\lambda$-convexity of $\Psi$ implies displacement $\lambda$-convexity
of $\mathcal{F}$, see also [Vil03, Theorem 5.15]. This means that for any constant speed
$W_2$-geodesic $(\rho_t)_{t \in [0,1]} \subset \mathcal{P}_2(\mathbb{R}^d)$ and any
$t \in [0, 1]$, we have

\[ \mathcal{F}(\rho_t) \leq (1 - t)\mathcal{F}(\rho_0) + t\mathcal{F}(\rho_1)
   - \frac{\lambda}{2}\, t(1 - t)\, W_2^2(\rho_0, \rho_1) . \tag{12} \]

In particular, $\mathcal{F}$ is finite along geodesics as soon as it is finite at the endpoints.
The fact that the relative Fisher information does not enjoy this property is the source of
several complications in [DLR13]. We recall further that $\mathcal{F}$ is lower semicontinuous
with respect to $W_2$-convergence, see [AGS08, Remark 9.4.2 and Lemma 9.4.3].

The semigroup associated to the Fokker-Planck equation (2) will be denoted by
$(P_t)_{t \geq 0}$. More precisely, for $\rho \in \mathcal{P}_2(\mathbb{R}^d)$ we set
$P_t \rho := \rho_t$, where $(\rho_t)_t$ is the unique distributional solution to the
Fokker-Planck equation (2) with $\rho_0 = \rho$. This solution can be obtained
using, e.g., the metric theory of gradient flows for (generalised) $\lambda$-convex functionals,
see [AGS08, Thm. 11.2.8].

In the following result we collect some well-known results on the behaviour of the
semigroup $(P_t)_{t \geq 0}$.

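Lemma 3.1. The following assertions hold:

(1) The curve $t \mapsto P_t \rho$ is continuous on $[0, \infty)$ and locally absolutely
continuous on $(0, \infty)$ with respect to $W_2$.

(2) For all $\rho, \sigma \in \mathcal{P}_2(\mathbb{R}^d)$ and all $t \geq 0$ we have the
contraction estimate:

\[ W_2(P_t \rho, P_t \sigma) \leq e^{-\lambda t}\, W_2(\rho, \sigma) . \tag{13} \]

Moreover, for any curve $(\rho_s)_s$ that is absolutely continuous with respect to $W_2$ we
have

\[ \|\partial_s (P_t \rho_s)\|_{-1, P_t \rho_s} \leq e^{-\lambda t}\, \|\partial_s \rho_s\|_{-1, \rho_s} . \tag{14} \]

(3) For all $\rho \in \mathcal{P}_2(\mathbb{R}^d)$ and $t > 0$ we have

\[ \mathcal{F}(P_t \rho) < \infty , \qquad \mathcal{G}(P_t \rho) < \infty , \tag{15} \]

as well as the bounds

\[ \mathcal{F}(P_t \rho) \leq \mathcal{F}(\rho) , \qquad
   \mathcal{G}(P_t \rho) \leq e^{-2\lambda t}\, \mathcal{G}(\rho) . \tag{16} \]

Finally, for any $W_2$-geodesic $(\rho_s)_{s \in [0,1]}$ with
$\mathcal{F}(\rho_0), \mathcal{F}(\rho_1) < \infty$, we have as $t \searrow 0$:

\[ \mathcal{F}(P_t \rho_s) \nearrow \mathcal{F}(\rho_s) \quad \text{uniformly for } s \in [0, 1] . \tag{17} \]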

Proof. For part (1) and the properties (13), (15) and (16), see [AGS08, Theorems 11.2.1
and 11.2.8]. The estimate (14) follows immediately from (13) and (11). It remains to
prove the statement (17), which is less standard. Note first that by (12) we have that
$s \mapsto \mathcal{F}(\rho_s)$ is continuous and bounded. Our aim is to show that for every
$\varepsilon > 0$ there exists $\delta > 0$ such that
$\mathcal{F}(\rho_s) - \mathcal{F}(P_t \rho_s) < \varepsilon$ whenever $t < \delta$ and
$s \in [0, 1]$. Assume the contrary, i.e., that there exist $\varepsilon > 0$ and sequences
$t_k \to 0$ and $(s_k) \subset [0, 1]$ such that for all $k$,

\[ \mathcal{F}(\rho_{s_k}) - \mathcal{F}(P_{t_k} \rho_{s_k}) \geq \varepsilon . \tag{18} \]

By compactness we can assume that $s_k \to s_0$ as $k \to \infty$ for some $s_0 \in [0, 1]$.
We claim that $P_{t_k} \rho_{s_k} \to \rho_{s_0}$ in $W_2$-distance as $k \to \infty$. Indeed,
the triangle inequality and (13) yield

\[ W_2(\rho_{s_0}, P_{t_k} \rho_{s_k})
   \leq W_2(\rho_{s_0}, P_{t_k} \rho_{s_0}) + W_2(P_{t_k} \rho_{s_0}, P_{t_k} \rho_{s_k})
   \leq W_2(\rho_{s_0}, P_{t_k} \rho_{s_0}) + e^{-\lambda t_k}\, W_2(\rho_{s_0}, \rho_{s_k}) , \]

and the claim follows from the continuity of $P_t$ at $t = 0$ and the continuity of the curve
$(\rho_s)$. Passing to the limit $k \to \infty$ in (18), using the continuity of
$s \mapsto \mathcal{F}(\rho_s)$ and the lower semicontinuity of $\mathcal{F}$ with respect to
$W_2$, we obtain the following contradiction:

\[ 0 = \mathcal{F}(\rho_{s_0}) - \mathcal{F}(\rho_{s_0})
   \geq \limsup_{k \to \infty} \big( \mathcal{F}(\rho_{s_k}) - \mathcal{F}(P_{t_k} \rho_{s_k}) \big)
   \geq \varepsilon , \]

which completes the proof. □
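The contraction estimate (13) can be observed directly in a case where the semigroup is explicit. The sketch below assumes $d = 1$ and $\Psi(x) = \lambda x^2/2$ with $\lambda \neq 0$, so that $P_t$ maps the Gaussian $N(m, s^2)$ to the Gaussian with mean $m e^{-\lambda t}$ and variance $s^2 e^{-2\lambda t} + (1 - e^{-2\lambda t})/\lambda$, and uses the closed form $W_2^2 = (m_1 - m_2)^2 + (s_1 - s_2)^2$ for one-dimensional Gaussians. The function names and sample values are hypothetical.

```python
import math


def semigroup_on_gaussian(m, s, lam, t):
    """P_t applied to N(m, s^2) for Psi(x) = lam * x^2 / 2 with lam != 0.

    Returns the mean and standard deviation of the resulting Gaussian.
    """
    decay = math.exp(-lam * t)
    variance = s * s * decay * decay + (1.0 - decay * decay) / lam
    return m * decay, math.sqrt(variance)


def w2_gaussian_1d(m1, s1, m2, s2):
    """W_2 distance between N(m1, s1^2) and N(m2, s2^2) on the real line."""
    return math.hypot(m1 - m2, s1 - s2)


def contraction_ratio(lam, t, gauss_a, gauss_b):
    """W_2(P_t a, P_t b) divided by e^{-lam t} W_2(a, b); (13) says this is <= 1."""
    before = w2_gaussian_1d(*gauss_a, *gauss_b)
    after = w2_gaussian_1d(*semigroup_on_gaussian(*gauss_a, lam, t),
                           *semigroup_on_gaussian(*gauss_b, lam, t))
    return after / (math.exp(-lam * t) * before)


print(contraction_ratio(1.0, 0.5, (0.0, 1.0), (2.0, 0.3)))   # convex potential
print(contraction_ratio(-0.5, 1.0, (0.0, 1.0), (2.0, 0.3)))  # lambda < 0
```

For $\lambda < 0$ the bound allows distances to grow, but by a factor of at most $e^{-\lambda t}$.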

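We conclude this section by stating some useful identities for the derivative of the
entropy. In fact, for any absolutely continuous curve $(\rho_t)_{t \in [0,1]}$ with
$\mathcal{F}(\rho_t) \in \mathbb{R}$ for all $t$ and
$\int_0^1 \mathcal{G}(\rho_t)\,dt < \infty$ we have that $t \mapsto \mathcal{F}(\rho_t)$ is
absolutely continuous with

\[ \frac{d}{dt} \mathcal{F}(\rho_t) = -\big\langle \partial_t \rho_t ,
   \Delta\rho_t + \operatorname{div}(\rho_t \nabla\Psi) \big\rangle_{-1,\rho_t} \tag{19} \]

for a.e. $t \in [0, 1]$, see [DLR13, Lemma 2.3]. In particular, if $\rho_t$ satisfies the
Fokker-Planck equation we have

\[ -\frac{d}{dt} \mathcal{F}(\rho_t)
   = \|\Delta\rho_t + \operatorname{div}(\rho_t \nabla\Psi)\|_{-1,\rho_t}^2
   = \mathcal{G}(\rho_t) , \tag{20} \]

where the second equality follows from (26).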
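The entropy dissipation identity (20) can be verified by hand in a Gaussian example, and the sketch below does so numerically. It assumes $d = 1$, $\Psi(x) = x^2/2$ and a centred Gaussian solution $\rho_t = N(0, v_t)$ of (2), for which $v_t = 1 + (v_0 - 1)e^{-2t}$, $\mathcal{F}(\rho_t) = -\tfrac12 \log(2\pi e\, v_t) + \tfrac12 v_t$ and $\mathcal{G}(\rho_t) = (v_t - 1)^2 / v_t$. These closed forms and the function names are assumptions of the illustration.

```python
import math


def variance_at(v0, t):
    """Variance of the Fokker-Planck solution started from N(0, v0), Psi = x^2/2."""
    return 1.0 + (v0 - 1.0) * math.exp(-2.0 * t)


def entropy(v):
    """F(N(0, v)): integral of rho log rho plus integral of Psi rho."""
    return -0.5 * math.log(2.0 * math.pi * math.e * v) + 0.5 * v


def fisher_information(v):
    """G(N(0, v)) relative to nu(dx) = exp(-x^2/2) dx."""
    return (v - 1.0) ** 2 / v


def entropy_time_derivative(v0, t, h=1e-5):
    """Central finite difference of t -> F(rho_t)."""
    f_plus = entropy(variance_at(v0, t + h))
    f_minus = entropy(variance_at(v0, t - h))
    return (f_plus - f_minus) / (2.0 * h)


v0, t = 0.25, 0.3
# Identity (20): -d/dt F(rho_t) = G(rho_t).
print(-entropy_time_derivative(v0, t), fisher_information(variance_at(v0, t)))
```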

4. Proof of the main result 

4.1. Upper bound. In this section we prove existence of the recovery sequence, i.e.,
statement (ii) of Theorem 2.2. For this purpose we define the set
$Q := \{\rho \in \mathcal{P}_2(\mathbb{R}^d) : \mathcal{G}(\rho) < \infty\}$. Note that
$\mathcal{F}(\rho) < \infty$ for all $\rho \in Q$ in view of Remark 2.3. Below we will
prove the following two claims:

Claim 4.1. For all $\rho_0, \rho_1 \in Q$ we have as $\tau \to 0$,

\[ I_\tau(\rho_1\,|\,\rho_0) - \frac{1}{4\tau} W_2^2(\rho_0, \rho_1)
   \;\longrightarrow\; \tfrac12 \mathcal{F}(\rho_1) - \tfrac12 \mathcal{F}(\rho_0) . \tag{21} \]

Claim 4.2. For every $\rho \in \mathcal{P}_2(\mathbb{R}^d)$ there exists a sequence
$(\rho^n)_n \subset Q$ such that $W_2(\rho^n, \rho) \to 0$ and
$\mathcal{F}(\rho^n) \to \mathcal{F}(\rho)$.

The existence of the recovery sequence then follows from a straightforward diagonal
argument, see [DLR13, Proposition 6.2] for details.


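Proof of Claim 4.1: We only need to prove the limsup inequality for the left-hand side
of (21), since the liminf inequality will be proved in Section 4.2 below. If
$\rho_0 = \rho_1$ the claim is immediate, so we take distinct measures $\rho_0, \rho_1 \in Q$,
and take a geodesic $(\rho_t)_{t \in [0,1]}$ connecting $\rho_0$ and $\rho_1$. We will
approximate this curve by running the semigroup for a small time
$\varepsilon = \varepsilon(\tau) > 0$, which will be determined below. A careful choice of
$\varepsilon$ as a function of $\tau$ is crucial for our argument. We thus consider the curve
$(\rho_t^\varepsilon)_{t \in [0,1]}$ defined by

\[ \rho_t^\varepsilon := \begin{cases}
   P_t \rho_0 , & 0 \leq t \leq \varepsilon , \\
   P_\varepsilon \rho_{(t - \varepsilon)/(1 - 2\varepsilon)} , & \varepsilon \leq t \leq 1 - \varepsilon , \\
   P_{1-t} \rho_1 , & 1 - \varepsilon \leq t \leq 1 .
   \end{cases} \]

For the sake of brevity, we shall write
$\mathcal{L}\rho = \Delta\rho + \operatorname{div}(\rho \nabla\Psi)$. Using the definition of
$I_\tau(\rho_1\,|\,\rho_0)$ and the second identity in (20), we obtain

\[ \begin{aligned}
   I_\tau(\rho_1\,|\,\rho_0) - \frac{1}{4\tau} W_2^2(\rho_0, \rho_1)
   &\leq \frac{1}{4\tau} \left( \int_0^1 \|\partial_t \rho_t^\varepsilon
      - \tau \mathcal{L}\rho_t^\varepsilon\|_{-1,\rho_t^\varepsilon}^2 \,dt
      - W_2^2(\rho_0, \rho_1) \right) \\
   &= \frac{1}{4\tau} \left( \int_0^1 \|\partial_t \rho_t^\varepsilon\|_{-1,\rho_t^\varepsilon}^2 \,dt
      - W_2^2(\rho_0, \rho_1) \right)
      - \frac12 \int_0^1 \langle \partial_t \rho_t^\varepsilon ,
        \mathcal{L}\rho_t^\varepsilon \rangle_{-1,\rho_t^\varepsilon} \,dt
      + \frac{\tau}{4} \int_0^1 \mathcal{G}(\rho_t^\varepsilon) \,dt .
   \end{aligned} \]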

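We shall estimate these three terms separately. Let $c_\lambda, k_\lambda > 0$ be sufficiently
large so that $\frac{e^{-2\lambda\varepsilon}}{1 - 2\varepsilon} \leq 1 + k_\lambda \varepsilon$
and $\int_0^\varepsilon e^{-2\lambda t}\,dt \leq c_\lambda \varepsilon$ for all
$\varepsilon \in (0, \tfrac14)$. Using the semigroup estimates (16) and (14) and the
Benamou-Brenier formula (10), the first term can be bounded by

\[ \begin{aligned}
   \int_0^1 \|\partial_t \rho_t^\varepsilon\|_{-1,\rho_t^\varepsilon}^2 \,dt
   &= \int_0^\varepsilon \|\mathcal{L}\rho_t^\varepsilon\|_{-1,\rho_t^\varepsilon}^2 \,dt
      + \frac{1}{1 - 2\varepsilon} \int_0^1
        \|\partial_t (P_\varepsilon \rho_t)\|_{-1,P_\varepsilon \rho_t}^2 \,dt
      + \int_{1-\varepsilon}^1 \|\mathcal{L}\rho_t^\varepsilon\|_{-1,\rho_t^\varepsilon}^2 \,dt \\
   &\leq \int_0^\varepsilon \mathcal{G}(P_t \rho_0) \,dt
      + \frac{e^{-2\lambda\varepsilon}}{1 - 2\varepsilon}
        \int_0^1 \|\partial_t \rho_t\|_{-1,\rho_t}^2 \,dt
      + \int_{1-\varepsilon}^1 \mathcal{G}(P_{1-t} \rho_1) \,dt \\
   &\leq c_\lambda \varepsilon\, \mathcal{G}(\rho_0)
      + (1 + k_\lambda \varepsilon)\, W_2^2(\rho_0, \rho_1)
      + c_\lambda \varepsilon\, \mathcal{G}(\rho_1) .
   \end{aligned} \]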


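For the third term we use (16) to obtain

\[ \int_0^1 \mathcal{G}(\rho_t^\varepsilon) \,dt
   \leq c_\lambda \varepsilon \big( \mathcal{G}(\rho_0) + \mathcal{G}(\rho_1) \big) + h(\varepsilon) ,
   \quad \text{where} \quad
   h(\varepsilon) = \int_0^1 \mathcal{G}(P_\varepsilon \rho_t) \,dt . \]

We claim that $h(\varepsilon)$ is finite for each $\varepsilon > 0$. Indeed, using (16) and
(20) we obtain

\[ \mathcal{G}(P_\varepsilon \rho_t) \int_0^\varepsilon e^{2\lambda(\varepsilon - s)} \,ds
   \leq \int_0^\varepsilon \mathcal{G}(P_s \rho_t) \,ds
   = \mathcal{F}(\rho_t) - \mathcal{F}(P_\varepsilon \rho_t) . \tag{22} \]

The right-hand side is uniformly bounded in $t$ thanks to the $\lambda$-convexity of
$\mathcal{F}$ and the uniform convergence (17). Consequently, $h(\varepsilon) < \infty$.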
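To treat the second term, we can thus use (19) to obtain

\[ \int_0^1 \langle \partial_t \rho_t^\varepsilon ,
   \mathcal{L}\rho_t^\varepsilon \rangle_{-1,\rho_t^\varepsilon} \,dt
   = \mathcal{F}(\rho_0) - \mathcal{F}(\rho_1) . \]

Combining these three bounds, we infer that

\[ \begin{aligned}
   I_\tau(\rho_1\,|\,\rho_0) - \frac{W_2^2(\rho_0, \rho_1)}{4\tau}
   \leq{}& \tfrac12 \mathcal{F}(\rho_1) - \tfrac12 \mathcal{F}(\rho_0)
      + c_\lambda \varepsilon \Big( \frac{1}{4\tau} + \frac{\tau}{4} \Big)
        \big( \mathcal{G}(\rho_0) + \mathcal{G}(\rho_1) \big) \\
   & + \frac{k_\lambda \varepsilon}{4\tau} W_2^2(\rho_0, \rho_1)
      + \frac{\tau}{4} h(\varepsilon) .
   \end{aligned} \]

We claim that $\varepsilon = \varepsilon(\tau)$ can be chosen as a function of $\tau$ such that

\[ \frac{\varepsilon(\tau)}{\tau} \to 0 \quad \text{and} \quad
   \tau h(\varepsilon(\tau)) \to 0 \quad \text{as } \tau \to 0 . \tag{23} \]

This yields the limsup inequality in (21). The corresponding liminf inequality will follow
from (8).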

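It thus remains to prove the claim (23). For $\varepsilon > 0$ we set

\[ g(\varepsilon) := \sqrt{\varepsilon} \big/ \sqrt{h(\varepsilon)} . \]

Writing
$g(\varepsilon) = \sqrt{\varepsilon e^{2\lambda\varepsilon}} \big/ \sqrt{e^{2\lambda\varepsilon} h(\varepsilon)}$,
it follows from (16) that $g$ is strictly increasing on $(0, \varepsilon_0)$ for
$\varepsilon_0$ sufficiently small. Taking into account that $h(0) > 0$ since
$\rho_0 \neq \rho_1$, we note that $\lim_{\varepsilon \to 0} g(\varepsilon) = 0$. To show that
$g$ is right-continuous, note that for each $t \in [0, 1]$, the function
$G_t : \varepsilon \mapsto \mathcal{G}(P_\varepsilon \rho_t)$ is lower semicontinuous and
non-negative, see e.g. [AGS08, Proposition 10.4.14]. Fatou's lemma implies that
$h := \int_0^1 G_t \,dt$ is lower semicontinuous as well. Hence $g$ is upper semicontinuous and
thus right-continuous, since it is also increasing.
It follows from these properties that we can define

\[ \varepsilon(\tau) := g^{-1}(\tau) := \inf\{\varepsilon : g(\varepsilon) > \tau\} \]

as the generalised inverse of $g$. We shall show that this function has the desired properties.

Since $g$ is right-continuous, we note that
$g(\varepsilon(\tau)/2) \leq \tau \leq g(\varepsilon(\tau))$, which implies that
the expressions in (23) can be estimated from above by

\[ \frac{\varepsilon(\tau)}{\tau} \leq \frac{\varepsilon(\tau)}{g(\varepsilon(\tau)/2)}
   = 2 \sqrt{\tfrac{\varepsilon(\tau)}{2}\, h\big(\tfrac{\varepsilon(\tau)}{2}\big)}
   \quad \text{and} \quad
   \tau h(\varepsilon(\tau)) \leq g(\varepsilon(\tau))\, h(\varepsilon(\tau))
   = \sqrt{\varepsilon(\tau)\, h(\varepsilon(\tau))} . \]

It thus suffices to show that $\varepsilon h(\varepsilon) \to 0$ as $\varepsilon \to 0$. To
show this, note that
$\varepsilon^{-1} \int_0^\varepsilon e^{2\lambda(\varepsilon - s)} \,ds \geq \min\{1, e^{\lambda/2}\} =: k_\lambda$
for all $\varepsilon \in (0, \tfrac14)$. Therefore, (22) yields

\[ k_\lambda \varepsilon\, \mathcal{G}(P_\varepsilon \rho_t)
   \leq \mathcal{F}(\rho_t) - \mathcal{F}(P_\varepsilon \rho_t) . \]

By (17) the right-hand side converges to 0 as $\varepsilon \to 0$, uniformly for
$t \in [0, 1]$. It follows that

\[ \varepsilon h(\varepsilon) = \varepsilon \int_0^1 \mathcal{G}(P_\varepsilon \rho_t) \,dt \to 0 \]

as $\varepsilon \to 0$, which completes the proof. □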
Proof of Claim 4.2: We approximate $\rho \in \mathcal{P}_2(\mathbb{R}^d)$ by applying the
semigroup. The second inequality in (15) yields that $P_\varepsilon \rho \in Q$ for any
$\varepsilon > 0$, and Lemma 3.1(1) implies that $P_\varepsilon \rho$ approximates $\rho$ in
$W_2$-distance. Finally, since $\mathcal{F}$ is lower semicontinuous with respect to
$W_2$, the convergence $\mathcal{F}(P_\varepsilon \rho) \to \mathcal{F}(\rho)$ as
$\varepsilon \to 0$ follows from (16). □

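4.2. Lower bound. For completeness, we reproduce here the short proof of statement
(i) in Theorem 2.2, the lower bound, as given in [DLR13, Theorem 5.1, see erratum].

Proof. By definition of the infimum in (6), there exists a sequence of absolutely continuous
curves $(\rho_t^\tau)_{t \in [0,1]}$ such that

\[ I_\tau(\rho_1^\tau\,|\,\rho_0) + \tau \geq \frac{1}{4\tau} \int_0^1
   \big\| \partial_t \rho_t^\tau - \tau \big( \Delta\rho_t^\tau
   + \operatorname{div}(\rho_t^\tau \nabla\Psi) \big) \big\|_{-1,\rho_t^\tau}^2 \,dt . \]

In particular, the right-hand side is finite for all $\tau > 0$. Since $(\rho_t^\tau)_t$ is
assumed to be 2-absolutely continuous, we infer that
$\int_0^1 \|\partial_t \rho_t^\tau\|_{-1,\rho_t^\tau}^2 \,dt$ is finite as well, and therefore

\[ \int_0^1 \mathcal{G}(\rho_t^\tau) \,dt
   = \int_0^1 \|\Delta\rho_t^\tau + \operatorname{div}(\rho_t^\tau \nabla\Psi)\|_{-1,\rho_t^\tau}^2 \,dt
   < \infty . \]

It follows that $t \mapsto \mathcal{F}(\rho_t^\tau)$ is absolutely continuous and the identity
(19) holds. Thus we can estimate

\[ \begin{aligned}
   I_\tau(\rho_1^\tau\,|\,\rho_0) + \tau
   &\geq \frac{1}{4\tau} \int_0^1 \big\| \partial_t \rho_t^\tau - \tau \big( \Delta\rho_t^\tau
      + \operatorname{div}(\rho_t^\tau \nabla\Psi) \big) \big\|_{-1,\rho_t^\tau}^2 \,dt \\
   &= \frac{1}{4\tau} \int_0^1 \|\partial_t \rho_t^\tau\|_{-1,\rho_t^\tau}^2 \,dt
      - \frac12 \int_0^1 \big\langle \partial_t \rho_t^\tau , \Delta\rho_t^\tau
        + \operatorname{div}(\rho_t^\tau \nabla\Psi) \big\rangle_{-1,\rho_t^\tau} \,dt \\
   &\quad + \frac{\tau}{4} \int_0^1 \|\Delta\rho_t^\tau
        + \operatorname{div}(\rho_t^\tau \nabla\Psi)\|_{-1,\rho_t^\tau}^2 \,dt \\
   &\geq \frac{1}{4\tau} W_2^2(\rho_0, \rho_1^\tau)
      + \tfrac12 \mathcal{F}(\rho_1^\tau) - \tfrac12 \mathcal{F}(\rho_0) ,
   \end{aligned} \]

where the last line follows from the Benamou-Brenier formula (10). The claim (8) then
follows from the lower semicontinuity of $\mathcal{F}$ with respect to $W_2$.

If $\nu(\mathbb{R}^d) < \infty$, the result follows by applying the lower semicontinuity of
$\mathcal{F}$ with respect to weak convergence in the final step. □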

We finish by remarking that in the statement of Theorem 2.2(i), the assumption of
Wasserstein convergence cannot be weakened to weak convergence if the equilibrium
measure $\nu$ does not have finite mass. A counterexample can be found in the erratum
to [DLR13].

Appendix A. Equivalent formulations of the Benamou-Brenier formula

The Benamou-Brenier formula in optimal transport asserts that for
$\rho_0, \rho_1 \in \mathcal{P}_2(\mathbb{R}^d)$,

\[ W_2^2(\rho_0, \rho_1) = \inf_{(\rho_t)_t \in AC(\rho_0, \rho_1)}
   \left\{ \int_0^1 |||\partial_t \rho_t|||_{-1,\rho_t}^2 \,dt \right\} . \tag{24} \]

In this formula, the norm $|||\cdot|||_{-1,\rho}$ is defined by

\[ |||s|||_{-1,\rho}^2 := \inf_{v \in L^2(\rho; \mathbb{R}^d)}
   \left\{ \int |v(x)|^2 \,d\rho(x) : s + \operatorname{div}(\rho v) = 0 \right\} \tag{25} \]

for $\rho \in \mathcal{P}(\mathbb{R}^d)$ and $s \in \mathcal{D}'$. It can be shown that the
infimum in this definition is uniquely attained, and its minimiser can be characterised as
follows: a solution $v \in L^2(\rho; \mathbb{R}^d)$ to the "continuity equation"
$s + \operatorname{div}(\rho v) = 0$ is optimal in (25) if and only if it belongs to
the space of generalised gradient vector fields defined by

\[ H_\rho := \overline{\{\nabla\psi : \psi \in \mathcal{D}\}}^{L^2(\rho; \mathbb{R}^d)} . \]

We refer to [AGS08, Section 8.4] for the proof of these facts. Note in particular that

\[ |||\operatorname{div}(\rho \nabla\psi)|||_{-1,\rho}^2
   = \int_{\mathbb{R}^d} |\nabla\psi(x)|^2 \,d\rho(x) \tag{26} \]

whenever $\nabla\psi \in L^2(\rho; \mathbb{R}^d)$.

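The following lemma relates the norm $|||\cdot|||_{-1,\rho}$ to the norm
$\|\cdot\|_{-1,\rho}$ defined in Section 2.

Lemma A.1. Let $\rho \in \mathcal{P}(\mathbb{R}^d)$ and $s \in \mathcal{D}'$. Then
$\|s\|_{-1,\rho} = |||s|||_{-1,\rho}$.

Proof. Suppose first that $|||s|||_{-1,\rho} < \infty$, and let
$v \in L^2(\rho; \mathbb{R}^d)$ be the unique minimiser in the definition of
$|||s|||_{-1,\rho}$. If $|||s|||_{-1,\rho} = 0$, it follows that $v$ vanishes $\rho$-a.e.,
hence $\langle s, f \rangle = 0$ for all $f \in \mathcal{D}$, which implies that
$\|s\|_{-1,\rho} = 0$. Assume now, without loss of generality, that
$|||s|||_{-1,\rho}^2 = \int |v|^2 \,d\rho = 1$. Then,

\[ \begin{aligned}
   \|s\|_{-1,\rho}
   &= \sup_{f \in \mathcal{D}} \Big\{ \langle -\operatorname{div}(\rho v), f \rangle
      : \int |\nabla f|^2 \,d\rho = 1 \Big\} \\
   &= \sup_{f \in \mathcal{D}} \Big\{ \int v \cdot \nabla f \,d\rho
      : \int |\nabla f|^2 \,d\rho = 1 \Big\} \\
   &= \sup_{f \in \mathcal{D}} \Big\{ \frac12 \int \big( |v|^2 + |\nabla f|^2
      - |v - \nabla f|^2 \big) \,d\rho : \int |\nabla f|^2 \,d\rho = 1 \Big\} \\
   &= \sup_{f \in \mathcal{D}} \Big\{ 1 - \frac12 \int |v - \nabla f|^2 \,d\rho
      : \int |\nabla f|^2 \,d\rho = 1 \Big\} .
   \end{aligned} \]

Since $v \in H_\rho$, it follows from this computation that
$\|s\|_{-1,\rho} = 1 = |||s|||_{-1,\rho}$.

On the other hand, if $\|s\|_{-1,\rho} < \infty$, it follows from
$\langle s, f \rangle \leq \|s\|_{-1,\rho} \cdot \|\nabla f\|_{L^2(\rho; \mathbb{R}^d)}$ that
the mapping

\[ T : \{\nabla f : f \in \mathcal{D}\} \to \mathbb{R} , \qquad \nabla f \mapsto \langle s, f \rangle \]

extends to a bounded linear functional
$T : (H_\rho, \|\cdot\|_{L^2(\rho; \mathbb{R}^d)}) \to \mathbb{R}$ of norm $\|s\|_{-1,\rho}$.
Hence, the Riesz representation theorem implies that
$\langle s, f \rangle = \int v \cdot \nabla f \,d\rho$ for some $v \in H_\rho$ with
$\|v\|_{L^2(\rho; \mathbb{R}^d)} = \|s\|_{-1,\rho}$. It follows that
$|||s|||_{-1,\rho} \leq \|v\|_{L^2(\rho; \mathbb{R}^d)}$. In view of the first part of the
proof, the latter inequality is in fact an equality. □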


As a consequence of this lemma we infer that the Benamou-Brenier formulas in (10) 
and (24) are equivalent. 